
    Exploring the Limitations of Behavior Cloning for Autonomous Driving

    Driving requires reacting to a wide variety of complex environmental conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. We show that behavior cloning leads to state-of-the-art results, including in unseen environments, executing complex lateral and longitudinal maneuvers without these reactions being explicitly programmed. However, we confirm well-known limitations (due to dataset bias and overfitting), new generalization issues (due to dynamic objects and the lack of a causal model), and training instability requiring further research before behavior cloning can graduate to real-world driving. The code of the studied behavior cloning approaches can be found at https://github.com/felipecode/coiltraine
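    For concreteness, below is a minimal sketch of the behavior cloning objective studied here: a policy network regresses the expert's controls from camera images with a plain supervised loss. It is an illustrative PyTorch toy, not the coiltraine code; the network `PolicyNet`, its layer sizes, and the L1 loss choice are assumptions made for the example.

```python
# Minimal behavior-cloning step (illustrative sketch, not the coiltraine code).
# A toy policy network regresses expert controls from camera frames; the loss
# is a plain L1 regression against the demonstrated actions.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):  # hypothetical stand-in for the studied CNN policies
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # steering, throttle, brake

    def forward(self, image):
        return self.head(self.backbone(image))

def bc_step(policy, optimizer, images, expert_actions):
    """One supervised update on a batch of (image, expert action) pairs."""
    pred = policy(images)
    loss = nn.functional.l1_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images = torch.randn(8, 3, 88, 200)   # dummy camera batch
actions = torch.rand(8, 3)            # dummy expert controls
print(bc_step(policy, opt, images, actions))
```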

    On Offline Evaluation of Vision-based Driving Models

    Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics. The supplementary video can be viewed at https://www.youtube.com/watch?v=P8K8Z-iF0cY. Comment: Published at the ECCV 2018 conference.
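    As a hedged illustration of the offline/online mismatch the paper quantifies, the snippet below computes an offline metric (mean absolute steering error) for a few models and correlates it with an online driving metric (success rate). All numbers are synthetic and only show the computation, not results from the paper.

```python
# Illustrative computation of an offline metric (mean absolute steering error)
# and its correlation with an online metric (success rate). Values are dummy.
import numpy as np

def offline_mae(pred_steer, gt_steer):
    """Offline metric: mean absolute error against ground-truth steering."""
    return float(np.mean(np.abs(pred_steer - gt_steer)))

rng = np.random.default_rng(0)
gt = rng.uniform(-1, 1, size=500)  # ground-truth steering from a validation log
models = {f"model_{i}": gt + rng.normal(0, 0.05 * (i + 1), size=500) for i in range(5)}

maes = np.array([offline_mae(p, gt) for p in models.values()])
success = rng.uniform(0.3, 0.9, size=5)  # hypothetical online success rates

# Pearson correlation between offline error and online driving quality;
# the paper shows this can be weak unless dataset and metric are chosen carefully.
corr = np.corrcoef(maes, success)[0, 1]
print(f"offline MAEs: {np.round(maes, 3)}  correlation with success: {corr:.2f}")
```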

    End-to-end Driving via Conditional Imitation Learning

    Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at https://youtu.be/cFtnflNe5fM. Comment: Published at the International Conference on Robotics and Automation (ICRA), 2018.
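    A minimal sketch of the command-conditioned (branched) idea is given below: a shared perception module produces a feature vector and the high-level command selects one of several action heads. Layer sizes and module names are illustrative assumptions, not the published architecture.

```python
# Sketch of command-conditioned imitation learning with a branched head:
# one action head per high-level command (follow lane, left, right, straight).
# Layer sizes and names are illustrative, not the published CIL architecture.
import torch
import torch.nn as nn

COMMANDS = ["follow", "left", "right", "straight"]

class BranchedCILPolicy(nn.Module):
    def __init__(self, feat_dim=128, n_actions=3):
        super().__init__()
        self.perception = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One branch per navigational command; the command selects the branch.
        self.branches = nn.ModuleList(
            [nn.Linear(feat_dim, n_actions) for _ in COMMANDS]
        )

    def forward(self, image, command_idx):
        feat = self.perception(image)
        out = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, C, A)
        idx = command_idx.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)  # action of the commanded branch

policy = BranchedCILPolicy()
img = torch.randn(2, 3, 88, 200)
cmd = torch.tensor([0, 2])          # "follow" and "right"
print(policy(img, cmd).shape)       # torch.Size([2, 3])
```

    At training time the same selection applies: only the branch matching the demonstration's command receives gradient, so each head specializes to one navigational behavior.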

    Scaling Vision-based End-to-End Driving with Multi-View Attention Learning

    In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised by vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some recent models achieve better performance than CILRS by using expensive sensor suites and/or large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS both by processing higher-resolution images using a human-inspired horizontal field of view (HFOV) as an inductive bias and by incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models that are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning. Comment: This paper has been accepted to the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023).
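    The sketch below illustrates the multi-view attention idea in the spirit of CIL++: each camera view is encoded into a feature token and a transformer encoder attends across views before regressing controls. Dimensions, module names, and the pooling choice are assumptions made for illustration, not the published CIL++ implementation.

```python
# Hedged sketch of multi-view attention: per-view features become tokens and
# self-attention mixes information across views before the control head.
import torch
import torch.nn as nn

class MultiViewAttentionDriver(nn.Module):
    def __init__(self, n_views=3, d_model=128, n_actions=2):
        super().__init__()
        self.encoder = nn.Sequential(  # shared per-view image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.view_embed = nn.Parameter(torch.zeros(1, n_views, d_model))
        self.head = nn.Linear(d_model, n_actions)  # e.g., steering, acceleration

    def forward(self, views):                       # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        tokens = self.encoder(views.flatten(0, 1)).view(b, v, -1)
        tokens = self.attn(tokens + self.view_embed)  # attention across views
        return self.head(tokens.mean(dim=1))          # pool views -> controls

driver = MultiViewAttentionDriver()
print(driver(torch.randn(2, 3, 3, 96, 96)).shape)   # torch.Size([2, 2])
```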

    Autobots: Latent Variable Sequential Set Transformers

    Robust multi-agent trajectory prediction is essential for the safe control of robots and vehicles that interact with humans. Many existing methods treat social and temporal information separately and therefore fall short of modelling the joint future trajectories of all agents in a socially consistent way. To address this, we propose a new class of Latent Variable Sequential Set Transformers which autoregressively model multi-agent trajectories. We refer to these architectures as "AutoBots". AutoBots model the contents of sets (e.g., representing the properties of agents in a scene) over time and employ multi-head self-attention blocks over these sequences of sets to encode the sociotemporal relationships between the different actors of a scene. This produces either the trajectory of one ego-agent or a distribution over the future trajectories for all agents under consideration. Our approach works for general sequences of sets and we provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. For the single-agent prediction case, we validate our model on the NuScenes motion prediction task and achieve competitive results on the global leaderboard. In the multi-agent forecasting setting, we validate our model on TrajNet. We find that our method outperforms physical extrapolation and recurrent network baselines and generates scene-consistent trajectories. Comment: 21 pages, 15 figures, 5 tables.
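    A minimal sketch of attention over a sequence of sets, in the spirit of the AutoBots encoder, is shown below: agent states over time are processed with social attention (across agents at each timestep) followed by temporal attention (across each agent's own timeline). Shapes and module names are illustrative assumptions, not the published model.

```python
# Hedged sketch of set-sequence attention: social attention mixes agents
# within a timestep, temporal attention mixes timesteps within an agent.
import torch
import torch.nn as nn

class SetSequenceEncoder(nn.Module):
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.social = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.temporal = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, x):                     # x: (B, T, A, D)
        b, t, a, d = x.shape
        # Social attention: treat the agents at one timestep as a set.
        s = x.reshape(b * t, a, d)
        s, _ = self.social(s, s, s)
        x = s.reshape(b, t, a, d)
        # Temporal attention: attend over each agent's own timeline.
        m = x.permute(0, 2, 1, 3).reshape(b * a, t, d)
        m, _ = self.temporal(m, m, m)
        return m.reshape(b, a, t, d).permute(0, 2, 1, 3)  # back to (B, T, A, D)

enc = SetSequenceEncoder()
traj = torch.randn(2, 10, 5, 64)   # 2 scenes, 10 timesteps, 5 agents
print(enc(traj).shape)             # torch.Size([2, 10, 5, 64])
```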

    On building end-to-end driving models through imitation learning

    Autonomous vehicles are now considered an assured asset of the future. Literally all the relevant car makers are in a race to produce fully autonomous vehicles. These car makers usually rely on modular pipelines for designing autonomous vehicles. This strategy decomposes the problem into a variety of tasks such as object detection and recognition, semantic and instance segmentation, depth estimation, SLAM and place recognition, as well as planning and control. Each module requires a separate set of expert algorithms, which are costly, especially in the amount of human labor and the need for data labeling.
    An alternative that has recently drawn considerable interest is end-to-end driving. In the end-to-end driving paradigm, perception and control are learned simultaneously using a deep network. These sensorimotor models are typically obtained by imitation learning from human demonstrations. The main advantage is that this approach can learn directly from large fleets of human-driven vehicles without requiring a fixed ontology and extensive amounts of labeling. However, scaling end-to-end driving methods to behaviors more complex than simple lane keeping or lead-vehicle following remains an open problem. In this thesis, in order to achieve more complex behaviors, we address several issues that arise when creating an end-to-end driving system through imitation learning. The first of these is the need for an environment for algorithm evaluation and for collecting driving demonstrations. On this matter, we participated in the creation of the CARLA simulator, an open-source platform built from the ground up for autonomous driving validation and prototyping. Since the end-to-end approach is purely reactive, there is also the need to provide an interface with a global planning system. To this end, we propose conditional imitation learning, which conditions the produced actions on a high-level command. Evaluation is also a concern and is commonly performed by comparing the end-to-end network output to some pre-collected driving dataset. We show that this is surprisingly weakly correlated with actual driving and propose strategies for acquiring better data and a better comparison strategy. Finally, we confirm well-known generalization issues (due to dataset bias and overfitting), new ones (due to dynamic objects and the lack of a causal model), and training instability; problems requiring further research before end-to-end driving through imitation can scale to real-world driving.

    Multimodal end-to-end autonomous driving

    Other grants: Antonio M. Lopez acknowledges the financial support by ICREA under the ICREA Academia Program. We also thank the Generalitat de Catalunya CERCA Program, as well as its ACCIO agency.
    A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception and maneuver planning and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from input raw sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
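    To make the three fusion schemes concrete, here is a hedged PyTorch sketch of early, mid, and late RGBD fusion with toy networks; it only illustrates where the modalities are combined and is not the architecture used in the paper.

```python
# Hedged sketch of the three RGBD fusion schemes: early (channel concatenation
# at the input), mid (feature concatenation after separate encoders) and late
# (combining per-modality control outputs). Toy networks, illustrative only.
import torch
import torch.nn as nn

def small_cnn(in_ch, out_dim=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
        nn.Conv2d(32, out_dim, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

rgb = torch.randn(2, 3, 96, 96)    # camera image
depth = torch.randn(2, 1, 96, 96)  # depth map (active sensor or monocular estimate)

# Early fusion: stack RGB and depth into a 4-channel input.
early = nn.Sequential(small_cnn(4), nn.Linear(64, 3))
a_early = early(torch.cat([rgb, depth], dim=1))

# Mid fusion: encode each modality separately, concatenate the features.
enc_rgb, enc_d = small_cnn(3), small_cnn(1)
mid_head = nn.Linear(128, 3)
a_mid = mid_head(torch.cat([enc_rgb(rgb), enc_d(depth)], dim=1))

# Late fusion: each modality predicts controls; average the predictions.
late_rgb = nn.Sequential(small_cnn(3), nn.Linear(64, 3))
late_d = nn.Sequential(small_cnn(1), nn.Linear(64, 3))
a_late = 0.5 * (late_rgb(rgb) + late_d(depth))

print(a_early.shape, a_mid.shape, a_late.shape)  # all torch.Size([2, 3])
```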